Deep Neural Networks Inspired by Differential Equations
Liu, Yongshuai, Wang, Lianfang, Qin, Kuilin, Zhang, Qinghua, Wang, Faqiang, Cui, Li, Liu, Jun, Duan, Yuping, Zeng, Tieyong
Deep learning has become a pivotal technology in fields such as computer vision, scientific computing, and dynamical systems, significantly advancing these disciplines. However, neural networks persistently face challenges related to theoretical understanding, interpretability, and generalization. To address these issues, researchers are increasingly adopting a differential-equations perspective to propose a unified theoretical framework and systematic design methodologies for neural networks. In this paper, we provide an extensive review of deep neural network architectures and dynamic modeling methods inspired by differential equations. We specifically examine deep neural network models and deterministic dynamical network constructs based on ordinary differential equations (ODEs), as well as regularization techniques and stochastic dynamical network models informed by stochastic differential equations (SDEs). We present numerical comparisons of these models to illustrate their characteristics and performance. Finally, we explore promising research directions in integrating differential equations with deep learning, offering new insights for developing intelligent computational methods with enhanced interpretability and generalization capabilities.
- Asia > China > Hong Kong (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (8 more...)
- Overview (1.00)
- Research Report (0.82)
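The ODE view of deep networks summarized in this abstract is commonly illustrated by reading a residual block as one explicit Euler step of dx/dt = f(x, θ(t)). The sketch below shows that reading with a hypothetical tanh layer function; the layer shapes, step size, and depth are illustrative choices, not details from the paper.

```python
import numpy as np

def f(x, W, b):
    """Hypothetical layer function: tanh nonlinearity over an affine map."""
    return np.tanh(W @ x + b)

def euler_resnet_forward(x, weights, biases, h=0.1):
    """Forward pass of a residual network read as the explicit Euler
    discretization of dx/dt = f(x, theta(t)):
        x_{t+1} = x_t + h * f(x_t, W_t, b_t)
    """
    for W, b in zip(weights, biases):
        x = x + h * f(x, W, b)
    return x

rng = np.random.default_rng(0)
dim, depth = 4, 8
weights = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(depth)]
biases = [rng.normal(scale=0.1, size=dim) for _ in range(depth)]
x0 = rng.normal(size=dim)
x_out = euler_resnet_forward(x0, weights, biases)
print(x_out.shape)  # (4,)
```

Shrinking h while increasing depth recovers the continuous-time ODE limit that neural-ODE models integrate directly.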
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator
Ogbogu, Chukwufumnanya, Narang, Gaurav, Joardar, Biresh Kumar, Doppa, Janardhan Rao, Chakrabarty, Krishnendu, Pande, Partha Pratim
Processing-In-Memory (PIM) architectures offer a promising approach to accelerate Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and SRAM exist, with each device offering unique trade-offs in terms of power, latency, area, and non-idealities. A heterogeneous manycore architecture enabled by 3D integration can combine multiple PIM devices on a single platform to enable energy-efficient and high-performance GNN training. In this work, we propose a 3D heterogeneous PIM-based accelerator for GNN training referred to as HePGA. We leverage the unique characteristics of GNN layers and associated computing kernels to optimize their mapping onto different PIM devices as well as planar tiers. Our experimental analysis shows that HePGA outperforms existing PIM-based architectures by up to 3.8x and 6.8x in energy efficiency (TOPS/W) and compute efficiency (TOPS/mm2), respectively, without sacrificing GNN prediction accuracy. Finally, we demonstrate the applicability of HePGA to accelerate inference of emerging transformer models.
- Asia > Middle East > Oman > Al Wusta Governorate > Haima (0.04)
- North America > United States > Washington > Whitman County > Pullman (0.04)
- North America > United States > Virginia (0.04)
- (5 more...)
Comparative Analysis of the Land Use and Land Cover Changes in Different Governorates of Oman using Spatiotemporal Multi-spectral Satellite Data
Shafi, Muhammad, Bokhari, Syed Mohsin
Land use and land cover (LULC) changes are key applications of satellite imagery, and they have critical roles in resource management, urbanization, protection of soils and the environment, and enhancing sustainable development. The literature has heavily utilized multispectral spatiotemporal satellite data alongside advanced machine learning algorithms to monitor and predict LULC changes. This study analyzes and compares LULC changes across various governorates (provinces) of the Sultanate of Oman from 2016 to 2021 using annual time steps. For the chosen region, multispectral spatiotemporal data were acquired from the open-source Sentinel-2 satellite dataset. Supervised machine learning algorithms were used to train and classify different land covers, such as water bodies, crops, and urban areas. The constructed model was subsequently applied within the study region, allowing for an effective comparative evaluation of LULC changes within the given timeframe.
- Asia > Middle East > Oman > Muscat Governorate > Muscat (0.05)
- Asia > Middle East > Oman > Ad Dakhiliyah Governorate > Nizwa (0.05)
- Asia > Middle East > Oman > Al Buraimi Governorate > Al-Buraimi (0.05)
- (16 more...)
- Food & Agriculture > Agriculture (1.00)
- Law > Real Estate Law (0.70)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.40)
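The supervised LULC classification described in this abstract can be reduced to its simplest form: assign each pixel's spectral signature to the nearest class prototype. The sketch below uses only two hypothetical band reflectances (red, NIR) and invented centroid values for illustration; the study itself uses full Sentinel-2 multispectral data and trained classifiers.

```python
import numpy as np

# Hypothetical mean reflectances (red, NIR) for three land-cover classes;
# real work would fit these from labeled training polygons across all bands.
CENTROIDS = {
    "water": np.array([0.05, 0.03]),
    "crops": np.array([0.08, 0.45]),
    "urban": np.array([0.30, 0.28]),
}

def classify_pixel(reflectance):
    """Assign a pixel to the class with the nearest spectral centroid."""
    return min(CENTROIDS, key=lambda c: np.linalg.norm(reflectance - CENTROIDS[c]))

print(classify_pixel(np.array([0.06, 0.04])))  # water
print(classify_pixel(np.array([0.10, 0.50])))  # crops
```

Running this per pixel for each annual composite, then differencing the resulting class maps, is the basic mechanics behind the year-over-year LULC comparison.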
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
Dhingra, Pratyush, Doppa, Janardhan Rao, Pande, Partha Pratim
Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common task to enhance the model's predictive performance on specific tasks/applications. Existing transformer accelerators are oblivious to the complexities introduced by fine-tuning. In this paper, we propose the design of a three-dimensional (3D) heterogeneous architecture referred to as Atleus that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes non-volatile memory and a systolic array for accelerating transformer computational kernels on an integrated 3D platform. Moreover, we design a suitable NoC to achieve high performance and energy efficiency. Finally, Atleus adopts an effective quantization scheme to support model compression. Experimental results demonstrate that Atleus outperforms the existing state-of-the-art by up to 56x and 64.5x in terms of performance and energy efficiency, respectively.
- Asia > Middle East > Oman > Al Wusta Governorate > Haima (0.06)
- North America > United States > Washington > Whitman County > Pullman (0.04)
- North America > United States > Oregon > Benton County > Corvallis (0.04)
- (4 more...)
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Guo, Cong, Cheng, Feng, Du, Zhixu, Kiessling, James, Ku, Jonathan, Li, Shiyu, Li, Ziru, Ma, Mingyuan, Molom-Ochir, Tergel, Morris, Benjamin, Shan, Haoxuan, Sun, Jingwei, Wang, Yitu, Wei, Chiyue, Wu, Xueying, Wu, Yuhao, Yang, Hao Frank, Zhang, Jingyang, Zhang, Junyao, Zheng, Qilin, Zhou, Guanglei, Li, Hai, Chen, Yiran
The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language processing and moving towards multi-modal functionality. These models are increasingly integrated into diverse applications, impacting both research and industry. However, their development and deployment present substantial challenges, including the need for extensive computational resources, high energy consumption, and complex software optimizations. Unlike traditional deep learning systems, LLMs require unique optimization strategies for training and inference, focusing on system-level efficiency. This paper surveys hardware and software co-design approaches specifically tailored to address the unique characteristics and constraints of large language models. This survey analyzes the challenges and impacts of LLMs on hardware and algorithm research, exploring algorithm optimization, hardware design, and system-level innovations. It aims to provide a comprehensive understanding of the trade-offs and considerations in LLM-centric computing systems, guiding future advancements in AI. Finally, we summarize the existing efforts in this space and outline future directions toward realizing production-grade co-design methodologies for the next generation of large language models and AI systems.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > North Carolina > Durham County > Durham (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (19 more...)
- Overview (1.00)
- Research Report > Promising Solution (0.67)
- Information Technology (0.93)
- Health & Medicine (0.67)
Dynamic Adaptive Optimization for Effective Sentiment Analysis Fine-Tuning on Large Language Models
Ding, Hongcheng, Zhao, Xuanze, Abdullah, Shamsul Nahar, Dewi, Deshinta Arrova, Jiang, Zixiao
Sentiment analysis plays a crucial role in various domains, such as business intelligence and financial forecasting. Large language models (LLMs) have become a popular paradigm for sentiment analysis, leveraging multi-task learning to address specific tasks concurrently. However, LLMs fine-tuned for sentiment analysis often underperform due to the inherent challenges in managing diverse task complexities. Moreover, constant-weight approaches in multi-task learning struggle to adapt to variations in data characteristics, further complicating model effectiveness. To address these issues, we propose a novel multi-task learning framework with a dynamic adaptive optimization (DAO) module. This module is designed as a plug-and-play component that can be seamlessly integrated into existing models, providing an effective and flexible solution for multi-task learning. The key component of the DAO module is a dynamic adaptive loss, which dynamically adjusts the weights assigned to different tasks based on their relative importance and data characteristics during training. Sentiment analyses on a standard and a customized financial text dataset demonstrate that the proposed framework achieves superior performance. Specifically, this work improves the Mean Squared Error (MSE) and Accuracy (ACC) by 15.58% and 1.24% respectively, compared with previous work.
- Asia > China (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Banking & Finance > Trading (0.93)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
- Information Technology > Services (0.68)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
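The dynamic loss weighting this abstract describes can be sketched with a simple rule: turn the current per-task losses into normalized weights so that harder tasks (larger loss) contribute more to the combined objective. The softmax rule and temperature below are an illustrative weighting scheme, not the paper's exact DAO formulation.

```python
import numpy as np

def dynamic_weights(task_losses, temperature=1.0):
    """Map per-task losses to normalized weights via a softmax, so tasks
    with larger current loss receive proportionally more weight.
    Illustrative only; the DAO module's actual rule may differ."""
    scaled = np.asarray(task_losses, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())   # subtract max for stability
    return exp / exp.sum()

losses = [2.0, 0.5, 1.0]                  # e.g., three sentiment sub-tasks
w = dynamic_weights(losses)
total = float(np.dot(w, losses))          # combined multi-task loss
print(w.round(3))
```

Recomputing `w` every step (or every few steps) is what makes the weighting "dynamic", in contrast to the constant-weight baselines the abstract criticizes.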
HeTraX: Energy Efficient 3D Heterogeneous Manycore Architecture for Transformer Acceleration
Dhingra, Pratyush, Doppa, Janardhan Rao, Pande, Partha Pratim
Transformers have revolutionized deep learning and generative modeling to enable unprecedented advancements in natural language processing tasks and beyond. However, designing hardware accelerators for executing transformer models is challenging due to the wide variety of computing kernels involved in the transformer architecture. Existing accelerators are either inadequate to accelerate end-to-end transformer models or suffer notable thermal limitations. In this paper, we propose the design of a three-dimensional heterogeneous architecture referred to as HeTraX specifically optimized to accelerate end-to-end transformer models. HeTraX employs hardware resources aligned
- Asia > Middle East > Oman > Al Wusta Governorate > Haima (0.07)
- North America > United States > California > Orange County > Newport Beach (0.04)
- North America > United States > Washington > Whitman County > Pullman (0.04)
- North America > United States > New York > New York County > New York City (0.04)
ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks
Afifi, Salma, Thakkar, Ishan, Pasricha, Sudeep
Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when compared to conventional approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Nevertheless, transformers typically demand substantial execution time due to their extensive computations and large memory footprint. Processing in-memory (PIM) and near-memory computing (NMC) are promising solutions to accelerating transformers as they offer high compute parallelism and memory bandwidth. However, designing PIM/NMC architectures to support the complex operations and massive amounts of data that need to be moved between layers in transformer neural networks remains a challenge. We propose ARTEMIS, a mixed analog-stochastic in-DRAM accelerator for transformer models. Through employing minimal changes to the conventional DRAM arrays, ARTEMIS efficiently alleviates the costs associated with transformer model execution by supporting stochastic computing for multiplications and temporal analog accumulations using a novel in-DRAM metal-on-metal capacitor. Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
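The stochastic-computing multiplication that ARTEMIS relies on in-DRAM has a compact software analogue: encode each operand as a Bernoulli bitstream and AND the streams, so the output stream's mean approximates the product. The bitstream length and seed below are illustrative parameters, not hardware details from the paper.

```python
import numpy as np

def sc_multiply(p1, p2, n_bits=100_000, seed=0):
    """Stochastic-computing multiplication: encode each probability in
    [0, 1] as a Bernoulli bitstream; a bitwise AND of independent streams
    yields a stream whose mean approximates the product p1 * p2."""
    rng = np.random.default_rng(seed)
    s1 = rng.random(n_bits) < p1
    s2 = rng.random(n_bits) < p2
    return (s1 & s2).mean()

approx = sc_multiply(0.6, 0.5)
print(approx)  # ~0.3
```

The appeal for in-DRAM hardware is that the multiply collapses to a single AND gate per bit, at the cost of precision that scales only with bitstream length.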
uTRAND: Unsupervised Anomaly Detection in Traffic Trajectories
D'Amicantonio, Giacomo, Bondarau, Egor, de With, Peter H. N.
Deep learning-based approaches have achieved significant improvements on public video anomaly datasets, but often do not perform well in real-world applications. This paper addresses two issues: the lack of labeled data and the difficulty of explaining the predictions of a neural network. To this end, we present a framework called uTRAND that shifts the problem of anomalous trajectory prediction from the pixel space to a semantic-topological domain. The framework detects and tracks all types of traffic agents in bird's-eye-view videos from traffic cameras mounted at an intersection. By conceptualizing the intersection as a patch-based graph, we show that the framework learns and models the normal behaviour of traffic agents without costly manual labeling. Furthermore, uTRAND makes it possible to formulate simple rules that classify anomalous trajectories in a way suited to human interpretation. We show that uTRAND outperforms other state-of-the-art approaches on a dataset of anomalous trajectories collected in a real-world setting, while producing explainable detection results.
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Europe > Switzerland (0.04)
- (3 more...)
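The patch-based-graph idea in the uTRAND abstract lends itself to a tiny sketch: model normal behaviour as the set of patch-to-patch transitions observed in unlabeled traffic, then flag any trajectory containing an unseen transition. The grid size, trajectories, and threshold below are invented for illustration; uTRAND's actual rules and tracking pipeline are more involved.

```python
from collections import Counter

def build_normal_model(trajectories):
    """Count patch-to-patch transitions seen in normal traffic; each
    trajectory is a sequence of patch (grid-cell) ids."""
    seen = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            seen[(a, b)] += 1
    return seen

def is_anomalous(traj, seen, min_count=1):
    """Flag a trajectory if any of its transitions was (almost) never
    observed during normal operation -- a simple human-readable rule."""
    return any(seen[(a, b)] < min_count for a, b in zip(traj, traj[1:]))

normal = [[0, 1, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
model = build_normal_model(normal)
print(is_anomalous([0, 1, 2, 3], model))  # False
print(is_anomalous([0, 3], model))        # True: 0->3 never seen
```

Because each flagged trajectory points to a specific never-seen transition, the detection is directly explainable, which is the property the abstract emphasizes.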
Defining Digital Quadruplets in the Cyber-Physical-Social Space for Parallel Driving
Liu, Teng, Xing, Yang, Chen, Long, Cao, Dongpu, Wang, Fei-Yue
Parallel driving is a novel framework to synthesize vehicle intelligence and transport automation. This article aims to define digital quadruplets in parallel driving. In cyber-physical-social systems (CPSS), based on the ACP method, the names of the digital quadruplets are first given: descriptive, predictive, prescriptive, and real vehicles. The objectives of the three virtual digital vehicles are to interact with, guide, simulate, and improve the real vehicles. Then, the three virtual components of the digital quadruplets are introduced in detail and their applications are illustrated. Finally, the real vehicles in the parallel driving system and the research process of the digital quadruplets are depicted. The presented digital quadruplets in parallel driving are expected to make future connected automated driving safe, efficient, and synergistic.
- Asia > China > Shandong Province > Qingdao (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > New York (0.04)
- (2 more...)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology (0.88)